Abstract

We demonstrate that the unbounded fan-out gate is very powerful. Constant-depth polynomial-size quantum circuits with bounded fan-in and unbounded fan-out over a fixed basis (denoted by QNC$_f^0$) can approximate with polynomially small error the following gates: parity, mod[q], And, Or, majority, threshold[t], exact[t], and Counting. Classically, we need logarithmic depth even if we can use unbounded fan-in gates. If we allow arbitrary one-qubit gates instead of a fixed basis, then these circuits can also be made exact in log-star depth. Sorting, arithmetic operations, phase estimation, and the quantum Fourier transform with arbitrary moduli can also be approximated in constant depth.

1 Introduction

In this paper, we study the power of shallow quantum circuits. Long quantum computations encounter various problems with decoherence, hence we want to speed them up as much as possible. We can exploit the following two types of parallelism:



1. Gates on different qubits can be applied at the same time. 

2. Commuting gates can be applied to the same qubits at the same time. 



The first approach is just the classical parallel computation. The second approach only makes sense when the gates applied to the same qubits commute, i.e. $AB = BA$; otherwise the outcome would be ambiguous. Being able to do this is a strong assumption; however, there are models of quantum computers in which it is physically feasible: ion-trap computers [CZ95] and bulk-spin resonance (NMR) [GC97]. The basic idea is that if two quantum gates commute, so do their Hamiltonians, and therefore we can apply their joint operation by performing both evolutions at the same time. This type of research started after the Mølmer-Sørensen paper [MS99]. Recently, a Hamiltonian implementing the fan-out gate (which is crucial for all our simulations) has been proposed by Fenner [Fen03].

In our paper, we investigate how much the power of quantum computation would increase if we allow such commuting gates. The computation in the stronger model must be efficient, therefore we do not require the ability to perform any set of commuting gates. This is in accordance with standard quantum computation, where we also allow only some gates. We choose a representative, the so-called unbounded fan-out gate, which is a sequence of controlled-not gates sharing one control qubit. We call it fan-out because, if all target qubits are zero, then the gate copies the classical source bit into n copies. We show that fan-out is in some sense universal for all sets of commuting gates. In particular, the joint operation of any set of commuting gates (that can be easily diagonalised) can be simulated by a constant-depth quantum circuit using just one-qubit and fan-out gates. To achieve this, we generalise the parallelisation method of [MN02, GHMP02] and adapt it to the constant-depth setting.

* Supported by Canada's NSERC and the Canadian Institute for Advanced Research (CIAR).
project QAIP, IST-1999-11234 and RESQ, IST-2001-37559.

We state our results in terms of circuit complexity classes. Classically, the main classes computed by constant-depth, polynomial-size circuits are:

NC$^0$ with Not and bounded fan-in gates: And, Or,
AC$^0$ with Not and unbounded fan-in gates: And, Or,
TC$^0$ with Not and unbounded fan-in gates: And, Or, threshold[t] for all t,
AC$^0$[q] with Not and unbounded fan-in gates: And, Or, mod[q],
ACC$^0$ = $\bigcup_q$ AC$^0$[q].

The zero in the exponent means constant depth; in general NC$^k$ means $(\log^k n)$-depth circuits. Several separations between these classes are known. Razborov [Raz87] proved that TC$^0$ is strictly more powerful than ACC$^0$. Using algebraic methods, Smolensky [Smo87] proved that AC$^0$[q] $\ne$ AC$^0$[q'], where q, q' are powers of distinct primes. In other words, threshold gates cannot be simulated by constant-depth circuits with unbounded fan-in Or gates, and mod[q] gates do not simulate each other.

The main quantum circuit classes corresponding to the classical classes are QNC$^0$, QAC$^0$, QTC$^0$, and QACC$^0$. We use the subscript 'f' to indicate circuits where we allow the fan-out gate (e.g. QNC$_f^0$). Classically, fan-out (copying the result of one gate into inputs of other gates) is taken for granted. Surprisingly, in contrast to the classical case, some of the quantum circuit classes are the same. Moore [Moo99] proved that parity is equivalent to fan-out, i.e. QAC$_f^0$ = QAC$^0$[2]. Green et al. [GHMP02] proved that allowing mod[q] gates with different moduli always leads to the same quantum classes, i.e. QACC$^0$ = QAC$^0$[q] for every integer $q \ge 2$.

In this paper, we extend these results and show that even exact[t] gates (which output 1 if the input is of Hamming weight t, and 0 otherwise) can be approximated with polynomially small error by fan-out and single-qubit gates in constant depth. Since exact[t] gates can simulate And, Or, threshold[t], and mod[q] gates, we conclude that the bounded-error versions of the following classes are equal: B-QNC$_f^0$ = B-QAC$_f^0$ = B-QTC$_f^0$. The exact[t] gate can be approximated in constant depth thanks to the parallelisation method. However, the simulation is not as straightforward as for mod[q] in [GHMP02] and it works only with high probability.

We then introduce a so-called Or-reduction that converts n input bits x into log n output bits y and preserves the Or function, i.e. x is nonzero if and only if y is. We show how to implement it exactly in constant depth and use it to achieve exact computation of Or and exact[t] in log-star depth. (Circuits of log-star depth are defined later in the paper.) We also apply the Or-reduction to decrease the size of most of our circuits.

Our results concerning the threshold[t] gate have several interesting implications. Siu et al. [SBKH93] proved that sorting and integer arithmetic (addition and multiplication of n integers, and division with remainder) are computable by constant-depth threshold circuits. It follows that all of them can be approximated in B-QNC$_f^0$.

The last contribution of our paper concerns the quantum Fourier transform (QFT). Cleve and Watrous [CW00] published an elegant log-depth quantum circuit that approximates the QFT. By optimising their methods to use the fan-out gate, we can approximate the QFT in constant depth with polynomially small error. First, we develop a circuit for the QFT with respect to a power-of-2 modulus, and then, using a technique of [HH99], we show that the QFT with respect to arbitrary moduli can be approximated too. Hence the QFT is in B-QNC$_f^0$. The QFT has many applications, one of which is the phase estimation of an unknown quantum state.

Shor's original algorithm for factoring [Sho94] uses the QFT and the modular exponentiation. Cleve and Watrous [CW00] have shown that it can be adapted to use modular multiplication of n integers. Since we prove that both the QFT and arithmetic operations are in B-QNC$_f^0$, polynomial-time bounded-error algorithms with oracle B-QNC$_f^0$ can factorise numbers and compute discrete logarithms. We can make the following conclusions: First, if B-QNC$_f^0$ can be simulated by a BPP machine, then factoring can be done in polynomial time by bounded-error Turing machines. Second, since it is unlikely that BQP = B-QNC$_f^0$, factoring and discrete logarithms are likely not the hardest things quantum computers can do.

2 Quantum circuits with unbounded fan-out 

Quantum circuits resemble classical reversible circuits. A quantum circuit is a sequence of 
quantum gates ordered into layers. The gates are consecutively applied in accordance with the 
order of the layers. Gates in one layer can be applied in parallel. The size of a gate is the 
number of affected qubits. The depth of a circuit is the number of layers and the size is the 
total size of all its gates. A circuit can solve problems of a fixed input size, so we define families 
of circuits containing one circuit for every input size. We consider only uniform families, whose 
description can be generated by a log-space Turing machine. 

A quantum gate is a unitary operator applied to some subset of qubits. We usually use gates from a fixed universal basis (Hadamard gate, rotation by an irrational multiple of $\pi$, and the controlled-not gate) that can approximate any quantum gate with good precision [ADH97]. The qubits are divided into 2 groups: Input/output qubits contain the description of the input at the beginning and they are measured in the computational basis at the end. Ancilla qubits are initialised to $|0\rangle$ at the beginning and the circuits usually clean them at the end, so that the output qubits are in a pure state and the ancillas may be reused.

Since unitary evolution is reversible, every operation can be undone. Running the computation backward is called uncomputation and is often used for cleaning ancilla qubits.

2.1 Definition of quantum gates 

Quantum circuits cannot use a naive quantum fan-out gate mapping every quantum superposition $|\phi\rangle|0\rangle \ldots |0\rangle$ to $|\phi\rangle \ldots |\phi\rangle$ due to the no-cloning theorem [WZ82]. Such a gate is not linear, let alone unitary. Instead, our fan-out gate copies only classical bits and the effect on superpositions is determined by linearity. It acts as a controlled-not-...-not gate, i.e. it is an unbounded sequence of controlled-not gates sharing one control qubit. Parity is a natural counterpart of fan-out. It is an unbounded sequence of controlled-not gates sharing one target qubit.

Definition 1 The fan-out gate maps $|y_1\rangle \ldots |y_n\rangle|x\rangle \to |y_1 \oplus x\rangle \ldots |y_n \oplus x\rangle|x\rangle$, where $x \oplus y = (x + y) \bmod 2$. The parity gate maps $|x_1\rangle \ldots |x_n\rangle|y\rangle \to |x_1\rangle \ldots |x_n\rangle|y \oplus (x_1 \oplus \ldots \oplus x_n)\rangle$.

Example. As used in [Moo99], parity and fan-out can simulate each other in constant depth. The Hadamard gate is $H = \frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ and it holds that $H^2 = I$. If a controlled-not gate is preceded and succeeded by Hadamard gates on both qubits, it just turns around. Since parity is a sequence of controlled-not gates, we can turn around all of them in parallel. The circuit is shown in Figure 1.
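The equivalence can be checked numerically on a few qubits. The following Python sketch is only an illustration and not part of the original construction; the helper names (`fanout`, `parity`, `perm_matrix`, `hadamard_all`, `matmul`) are assumptions introduced here. It builds both gates as permutation matrices and verifies that conjugating parity by Hadamard gates on all qubits yields fan-out controlled by the former target qubit.

```python
from itertools import product

def fanout(bits):
    """Fan-out on a basis state (y_1, ..., y_n, x): x is the control."""
    *ys, x = bits
    return tuple(y ^ x for y in ys) + (x,)

def parity(bits):
    """Parity on a basis state (x_1, ..., x_n, y): y is the target."""
    *xs, y = bits
    return tuple(xs) + (y ^ (sum(xs) % 2),)

def perm_matrix(gate, nq):
    """Matrix of a classical reversible gate acting on nq qubits."""
    states = list(product((0, 1), repeat=nq))
    index = {s: i for i, s in enumerate(states)}
    mat = [[0.0] * len(states) for _ in states]
    for s in states:
        mat[index[gate(s)]][index[s]] = 1.0
    return mat

def hadamard_all(nq):
    """Hadamard gate applied to every one of the nq qubits."""
    dim = 2 ** nq
    norm = 2 ** (-nq / 2)
    return [[norm * (-1) ** bin(a & b).count("1") for b in range(dim)]
            for a in range(dim)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def parity_conjugated_is_fanout(nq):
    """Check H^{(x)nq} . Parity . H^{(x)nq} == Fan-out entry by entry."""
    h = hadamard_all(nq)
    lhs = matmul(h, matmul(perm_matrix(parity, nq), h))
    rhs = perm_matrix(fanout, nq)
    return all(abs(lhs[i][j] - rhs[i][j]) < 1e-9
               for i in range(len(lhs)) for j in range(len(lhs)))
```

For example, `parity_conjugated_is_fanout(3)` compares the two 8 x 8 matrices for two data qubits plus the shared control/target qubit.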
Figure 1: Equivalence of parity and fan-out

Figure 2: Implementing an arbitrary controlled one-qubit gate



In this paper, we investigate the circuit complexity of, among others, these gates: 

Definition 2 Let $x = x_1 \ldots x_n$ and let $|x|$ denote the Hamming weight of $x$. The following $(n+1)$-qubit gates map $|x\rangle|y\rangle \to |x\rangle|y \oplus g(x)\rangle$, where $g(x) = 1$ iff

$|x| > 0$: Or, $|x| = n$: And (Toffoli), $|x| > \frac n2$: majority,
$|x| \bmod q = 0$: mod[q], $|x| \ge t$: threshold[t], $|x| = t$: exact[t].

A counting gate is any gate that maps $|x\rangle|0^m\rangle \to |x\rangle|\,|x|\,\rangle$ for $m = \lceil \log(n+1) \rceil$.
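On computational basis states these gates are ordinary Boolean predicates of the Hamming weight. The sketch below spells them out in Python for illustration; the function names and the bit-tuple representation are assumptions made here, and the action on superpositions follows by linearity as in Definition 1.

```python
import math

def hamming_weight(x):
    """Number of ones in the bit tuple x."""
    return sum(x)

def or_gate(x):
    return hamming_weight(x) > 0

def and_gate(x):
    return hamming_weight(x) == len(x)

def majority(x):
    return hamming_weight(x) > len(x) / 2

def mod_gate(q):
    return lambda x: hamming_weight(x) % q == 0

def threshold(t):
    return lambda x: hamming_weight(x) >= t

def exact(t):
    return lambda x: hamming_weight(x) == t

def apply_gate(g, x, y):
    """Basis-state action |x>|y> -> |x>|y XOR g(x)>."""
    return x, y ^ int(g(x))

def counting(x):
    """Basis-state action of a counting gate: |x| written on m = ceil(log2(n+1)) bits."""
    m = math.ceil(math.log2(len(x) + 1))
    return format(hamming_weight(x), "0{}b".format(m))
```

For instance, `apply_gate(exact(3), (1, 0, 1, 1), 0)` flips the target bit because the input has Hamming weight 3.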



2.2 Quantum circuit classes 

Definition 3 QNC$_f$(d(n)) contains operators computed exactly (i.e. without error) by uniform families of quantum circuits with fan-out of depth $O(d(n))$, polynomial size, and over a fixed basis. QNC$_f^k$ = QNC$_f$($\log^k n$). R-QNC$_f^k$ contains operators approximated with one-sided, and B-QNC$_f^k$ with two-sided, polynomially small error.

Remark. The circuits below are over a fixed universal basis, unless explicitly mentioned otherwise. Some of our circuits need arbitrary one-qubit gates to be exact. For simplicity, we sometimes include several fixed-size gates (e.g. the binary Or gate and controlled one-qubit gates) in our set of basis gates. This inclusion does not influence the asymptotic depth of our circuits, since every s-qubit quantum gate can be decomposed into a sequence of one-qubit and controlled-not gates of length $O(s^3 4^s)$ [BBC+95].

For every one-qubit gate U, there exist one-qubit gates A, B, C and a rotation $P = R_z(\alpha)$ such that the controlled gate U is computed by the constant-depth circuit shown in Figure 2 [BBC+95, Lemma 5.1]. If a qubit controls several one-qubit gates, then we can still use this method in constant depth. We just replace the controlled-not gate by the fan-out gate, and the rotations P are multiplied.



3 Parallelisation method 

In this section, we describe a general parallelisation method for achieving very shallow circuits. 
We then apply it to the rotation by Hamming weight and the rotation by value, and show how 
to compute them in constant depth. 
Figure 3: A serial circuit with interpolated basis changes

Figure 4: A parallelised circuit performing $U = T^\dagger \big(\prod_{i=1}^n V_i^{x_i}\big) T = \prod_{i=1}^n U_i^{x_i}$



3.1 General method 

The unbounded fan-out gate is universal for commuting gates in the following sense: Using 
fan-out, gates can be applied to the same qubits at the same time whenever (1) they commute, 
(2) we know the basis in which they all are diagonal, and (3) we can efficiently change into the 
basis. The method reduces the depth, but may in general require the use of ancilla qubits. 



Lemma 1 [HJ85, Theorem 1.3.19] For every set of pairwise commuting unitary gates, there exists an orthogonal basis in which all the gates are diagonal.



Theorem 2 [MN02, GHMP02] Let $\{U_i\}_{i=1}^n$ be pairwise commuting gates on k qubits. Gate $U_i$ is controlled by qubit $|x_i\rangle$. Let T be a gate changing the basis according to Lemma 1. There exists a quantum circuit with fan-out computing $U = \prod_{i=1}^n U_i^{x_i}$ having depth $\max_{i=1}^n \mathrm{depth}(U_i) + 4 \cdot \mathrm{depth}(T) + 2$, size $\sum_{i=1}^n \mathrm{size}(U_i) + (2n+2) \cdot \mathrm{size}(T) + 2n$, and using $(n-1)k$ ancillas.

Proof. Consider a circuit that applies all $U_i$ sequentially. Put $TT^\dagger = I$ between $U_i$ and $U_{i+1}$. The circuit is shown in Figure 3. Take $V_i = T^\dagger U_i T$ as new gates. They are diagonal in the computational basis, hence they just impose some phase shifts. Multiple phase shifts on entangled states multiply, so they can be applied in parallel. We use fan-out gates twice: first to create n entangled copies of target qubits and then to destroy the entanglement. The final circuit with the desired parameters is shown in Figure 4. □
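The key step of the proof, that phase shifts applied to different entangled copies multiply, can be illustrated on a single target qubit. The Python sketch below is a toy model under stated assumptions: the gates are already diagonal, $V_i = \mathrm{diag}(1, e^{i\theta_i})$, the basis change T is omitted, and the names `fanout_copies`, `apply_phase` and `parallel_phases` are hypothetical helpers introduced here.

```python
import cmath

def fanout_copies(state, n):
    """Fan-out a one-qubit state {bit: amplitude} into n entangled copies."""
    return {(b,) * n: amp for b, amp in state.items()}

def apply_phase(state, position, theta):
    """Apply diag(1, e^{i theta}) to the qubit at the given position."""
    return {bits: amp * (cmath.exp(1j * theta) if bits[position] else 1)
            for bits, amp in state.items()}

def parallel_phases(state, thetas):
    """Apply all phase gates in one layer, each on its own copy of the target."""
    n = len(thetas)
    copies = fanout_copies(state, n)
    for i, theta in enumerate(thetas):  # gates act on distinct qubits: one layer
        copies = apply_phase(copies, i, theta)
    # A second fan-out resets the n - 1 ancilla copies to 0, so only the
    # first qubit carries information; we read the state off that qubit.
    return {bits[0]: amp for bits, amp in copies.items()}
```

Applied to the state $0.6|0\rangle + 0.8|1\rangle$ with angles 0.1, 0.2, 0.3, the amplitude of $|1\rangle$ picks up the total phase $e^{0.6i}$, exactly as if the three gates had been applied one after another.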



Example. As used in [Moo99], it is simple to prove that mod[q] ∈ QNC$_f^0$. Each input qubit controls one increment modulo q on a counter initialised to 0. At the end, we obtain $|x| \bmod q$. The modular increments commute and thus can be parallelised. Since q is fixed, changing the basis and the increment can both be done in constant depth.

Figure 5: Rotation by Hamming weight and value



3.2 Rotation by Hamming weight and value 

In this paper, we often use a rotation by Hamming weight $R_z(\varphi|x|)$ and a rotation by value $R_z(\varphi x)$, where $R_z(\alpha)$ is the one-qubit rotation around the z-axis by angle $\alpha$: $R_z(\alpha) = |0\rangle\langle 0| + e^{i\alpha}|1\rangle\langle 1|$. They can both be computed in constant depth.

Lemma 3 For every angle $\varphi$, there exist constant-depth, linear-size quantum circuits with fan-out computing $R_z(\varphi|x|)$ and $R_z(\varphi x)$ on input $x = x_{n-1} \ldots x_1 x_0$.

Proof. The left circuit in Figure 5 shows how to compute the rotation by Hamming weight. Each input qubit controls $R_z(\varphi)$ on the target qubit, hence the total angle is $\varphi|x|$. These controlled rotations are parallelised using the parallelisation method. The right circuit shows the rotation by value. It is similar to the rotation by Hamming weight, only the input qubit $|x_j\rangle$ controls $R_z(\varphi 2^j)$, hence the total angle is $\varphi \sum_{j=0}^{n-1} 2^j x_j = \varphi x$. □
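The accumulated angles in this proof can be verified by multiplying the phases that the controlled rotations put on the $|1\rangle$ component of the target. The following Python sketch does this for a classical input; the function names are hypothetical, and the list `bits` is assumed to hold $x_0, x_1, \ldots, x_{n-1}$ (least significant bit first).

```python
import cmath

def rz_phase(angle):
    """Phase that R_z(angle) = |0><0| + e^{i angle}|1><1| applies to |1>."""
    return cmath.exp(1j * angle)

def rotation_by_hamming_weight(phi, bits):
    """Every set input bit controls R_z(phi) on the target."""
    phase = 1
    for b in bits:
        if b:
            phase *= rz_phase(phi)
    return phase

def rotation_by_value(phi, bits):
    """Input bit x_j controls R_z(phi * 2**j) on the target."""
    phase = 1
    for j, b in enumerate(bits):
        if b:
            phase *= rz_phase(phi * 2 ** j)
    return phase
```

With `bits = [1, 0, 1, 1]` (so $x = 13$ and $|x| = 3$), the two functions return $e^{3i\varphi}$ and $e^{13i\varphi}$ respectively.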

Remark. The construction uses rotations $R_z(\varphi)$ for arbitrary $\varphi \in \mathbb{R}$. However, we are only allowed to use a fixed set of one-qubit gates. It is easy to see that every rotation can be approximated with polynomially small error by $R_z(\theta q) = (R_z(\theta))^q$, where $\theta$ is the angle of the fixed basis rotation (an irrational multiple of $\pi$) and q is a polynomially large integer [ADH97]. These q rotations commute, so they can be applied in parallel and the depth is preserved. The approximation can be kept down to polynomially small error while increasing the size of the circuit only polynomially.



4 Constant-depth approximate circuits 
4.1 Or gate 

It is easy to see that the rotation by Hamming weight of a string y of length m with angle $\varphi = \frac{2\pi}{m}$ can be used to distinguish the zero string $y = 0^m$ from strings with approximately $\frac m2$ ones. We, however, want to distinguish the zero string from all nonzero strings. It turns out that if we compute $m = O(n \log n)$ rotations by Hamming weight of the input x with angles distributed evenly around the circle, we obtain a string y that is either zero (for $x = 0^n$), or has expected Hamming weight $\frac m2$ (for $x \ne 0^n$). By combining these two results, we can approximate the Or gate and, with a minor modification, also the exact[t] gate in constant depth.

Let $w \in \mathbb{N}_0$ and let $\varphi$ be an angle. Define a notation for the following one-qubit state:

\[ |\mu_\varphi^w\rangle = (H \cdot R_z(\varphi w) \cdot H)\,|0\rangle = \frac{1 + e^{i\varphi w}}{2}\,|0\rangle + \frac{1 - e^{i\varphi w}}{2}\,|1\rangle. \tag{1} \]

By Lemma 3, $|\mu_\varphi^{|x|}\rangle$ can be computed in constant depth and linear size.

Theorem 4 Or ∈ R-QNC$_f^0$. In particular, Or can be approximated with one-sided error $\frac 1n$ in constant depth and size $O(n^2 \log n)$.

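Proof. Let n denote the size of the input x. Let $m = a \cdot n$, where a will be chosen later. For all $k \in \{0, 1, \ldots, m-1\}$, compute in parallel $|y_k\rangle = |\mu_{\varphi_k}^{|x|}\rangle$ for angle $\varphi_k = \frac{2\pi}{m} k$. If $|y_k\rangle$ is measured in the computational basis, the expected value of the outcome $Y_k \in \{0, 1\}$ is

\[ E[Y_k] = \left| \frac{1 - e^{i\varphi_k |x|}}{2} \right|^2 = \frac{1 - \cos(\varphi_k |x|)}{2}. \]

If all these m qubits $|y\rangle$ are measured, the expected Hamming weight of all Y's is

\[ E[|Y|] = E\Big[\sum_{k=0}^{m-1} Y_k\Big] = \frac m2 - \frac 12 \sum_{k=0}^{m-1} \cos\Big(\frac{2\pi k}{m}|x|\Big) = \begin{cases} 0 & \text{if } |x| = 0, \\ \frac m2 & \text{if } |x| \ne 0. \end{cases} \]

The qubits $|y\rangle$ are actually not measured, but their Hamming weight $|y|$ controls another rotation on a new ancilla qubit $|z\rangle$. So compute $|z\rangle = |\mu_{2\pi/m}^{|y|}\rangle$. Let Z be the outcome after $|z\rangle$ is measured. If $|y| = 0$, then $Z = 0$ with certainty. If $\big| |y| - \frac m2 \big| \le \frac{m}{\sqrt n}$, then

\[ P[Z = 0] = \left| \frac{1 + e^{2\pi i |y| / m}}{2} \right|^2 = \frac{1 + \cos(2\pi |y| / m)}{2} \le \frac 12 - \frac 12 \cos\frac{2\pi}{\sqrt n} = O\Big(\frac 1n\Big). \]

Assume that $|x| \ne 0$. We want to upper-bound the probability of the bad event that $|Y|$ is not close to $\frac m2$. Since $0 \le Y_k \le 1$, we can use Hoeffding's Lemma 5 below and obtain $P\big[\big||Y| - \frac m2\big| \ge \varepsilon m\big] \le 2e^{-2\varepsilon^2 m}$. Fix $a = \log n$ and $\varepsilon = \frac{1}{\sqrt n}$. Now, $P\big[\big||Y| - \frac m2\big| \ge \frac{m}{\sqrt n}\big] \le 2e^{-2m/n} = 2e^{-2a} = O(\frac 1n)$. The probability that we observe the incorrect result $Z = 0$ is at most the sum of the probabilities of the two bad events, i.e. $O(\frac 1n)$. Hence

\[ P[Z = 0] = \begin{cases} 1 & \text{if } |x| = 0, \\ O(\frac 1n) & \text{if } |x| \ne 0. \end{cases} \]

The circuit has constant depth and size $O(mn) = O(n^2 \log n)$. It is outlined in Figure 6. The figure is slightly simplified: unimportant qubits and uncomputation of ancillas are omitted. □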
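The one-sided behaviour of the circuit can be checked numerically for small inputs. The Python sketch below follows the analysis of the proof (the first-layer qubits are treated as measured, so $|Y|$ is a sum of independent Bernoulli variables) and computes $P[Z = 0]$ exactly by dynamic programming. The function name and the concrete choice $m = n \lceil \log_2 n \rceil$ are assumptions made for this illustration.

```python
import math

def or_circuit_p_zero(n, weight):
    """P[Z = 0] for an n-bit input of the given Hamming weight."""
    m = n * max(1, math.ceil(math.log2(n)))  # m = a * n with a = log n
    # First layer: Y_k = 1 with probability (1 - cos(phi_k * weight)) / 2,
    # where phi_k = 2 * pi * k / m.
    probs = [(1 - math.cos(2 * math.pi * k * weight / m)) / 2 for k in range(m)]
    # Exact distribution of |Y| (sum of independent Bernoulli variables).
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for w, pw in enumerate(dist):
            new[w] += pw * (1 - p)
            new[w + 1] += pw * p
        dist = new
    # Second layer: given |Y| = w, Z = 0 with probability (1 + cos(2*pi*w/m)) / 2.
    return sum(pw * (1 + math.cos(2 * math.pi * w / m)) / 2
               for w, pw in enumerate(dist))
```

For the zero input the function returns 1, and for every nonzero Hamming weight on n = 16 bits the returned probability of the wrong answer stays below 1/n.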



Lemma 5 (Hoeffding [Hoe63]) If $Y_1, \ldots, Y_m$ are independent random variables bounded by $a_k \le Y_k \le b_k$, then, for all $\varepsilon > 0$,

\[ P\big[\,|S - E[S]| \ge \varepsilon m\,\big] \le 2 \exp\left( \frac{-2 m^2 \varepsilon^2}{\sum_{k=1}^m (b_k - a_k)^2} \right), \quad \text{where } S = \sum_{k=1}^m Y_k. \]

Figure 6: Constant depth circuit approximating Or

Remark. Since the outcome z of the circuit in Figure 6 is a classical bit, we can save it in an ancilla qubit by applying a controlled-not gate and clean $|y\rangle$ by uncomputation. It remains to prove that the intermediate qubits $|y\rangle$ need not be measured, in order to be able to uncompute them. We show above that the output qubit is a good approximation of the logical Or, provided $|y\rangle$ is immediately measured. By the principle of deferred measurement, we can use controlled quantum operations and measure $|y\rangle$ at the end. However, the output bit is close to a classical bit (the distance depends on the error of the computation), thus it is only slightly entangled with $|y\rangle$, and hence it does not matter whether $|y\rangle$ is measured.

Definition 4 Let $\log^{(k)} x$ denote the k-times iterated logarithm $\log \log \ldots \log x$. The log-star function, $\log^* x$, is the maximum number of iterations k such that $\log^{(k)} x$ exists and is real.¹

Remark. If we require error $\frac{1}{n^c}$, we create c copies and compute the exact Or of them by a binary tree of Or gates. The tree has depth $\log c = O(1)$. In Section 6.1 we show how to approximate Or in constant depth and size $O(n \log^{(k)} n)$ for any constant k. In Section 6.2 we show how to compute Or exactly in log-star depth and linear size.

4.2 Exact[t] and threshold[t] gates 
Theorem 6 exact[t] ∈ R-QNC$_f^0$.

Proof. We slightly modify the circuit for Or. As outlined in Figure 7, by adding the rotation $R_z(-\varphi t)$ to the rotation by Hamming weight in the first layer, we obtain $|\mu_\varphi^{|x|-t}\rangle$ instead of $|\mu_\varphi^{|x|}\rangle$. The second layer stays the same. If the output qubit $|z\rangle$ is measured, then

\[ P[Z = 0] = \begin{cases} 1 & \text{if } |x| = t, \\ O(\frac 1n) & \text{if } |x| \ne t. \end{cases} \]

We obtain an approximation of the exact[t] gate with one-sided polynomially small error. □

Remark. Other gates are computed from the exact[t] gate by standard methods. For example, threshold[t] can be computed as the parity of exact[t], exact[t+1], ..., exact[n]. The depth stays constant and the size is just n times bigger, i.e. $O(n^3 \log n)$, hence threshold[t] ∈ B-QNC$_f^0$. In a later section we show how to approximate exact[t], threshold[t], and counting in constant depth and size $O(n \log n)$.
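The reduction of threshold[t] to exact gates in the remark works because at most one of exact[t], ..., exact[n] fires on any input, so their parity equals their Or. A brute-force Python check of this identity follows; the function names are hypothetical and the gates are modelled on classical bit tuples.

```python
from itertools import product

def exact(t, x):
    """exact[t] on a classical input: 1 iff the Hamming weight equals t."""
    return int(sum(x) == t)

def threshold_via_exact(t, x):
    """Parity of exact[t], exact[t+1], ..., exact[n] evaluated on x."""
    out = 0
    for s in range(t, len(x) + 1):
        out ^= exact(s, x)
    return out

def identity_holds(n):
    """Compare with the direct definition |x| >= t for all inputs and all t."""
    return all(threshold_via_exact(t, x) == int(sum(x) >= t)
               for x in product((0, 1), repeat=n)
               for t in range(n + 1))
```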

¹ The log-star of the estimated number of atoms in the universe is 5. Consequently, for the computational problems we consider in this paper, the log-star is in practice at most 5.

Figure 7: Rotation by Hamming weight with added rotation

4.3 Arithmetic operations 

Using threshold gates, one can do arithmetic operations in constant depth. The following 
circuits take as part of the input an ancilla register in state |0) and output the result of the 
computation in that register. 

Theorem 7 The following functions are in B-QNC$_f^0$: addition and multiplication of n integers, division of integers with remainder, and sorting of n integers.

Proof. By [SBKH93], these functions are computed by constant-depth,² polynomial-size threshold circuits. A threshold circuit is built of weighted threshold gates. It is simple to prove that the weighted threshold gate (with polynomially large integer weights) is also in B-QNC$_f^0$. One only needs to rotate the phase of the quantum state in Lemma 3 by integer multiples of the basic angle. □



In the following section, we require a reversible version of modular addition.

Definition 5 Let q be an n-bit integer and $x_1, \ldots, x_m \in \mathbb{Z}_q$. The reversible addition gate maps

\[ \mathrm{add}_q^m : |q\rangle|x_1\rangle \ldots |x_m\rangle \to |q\rangle|x_1\rangle \ldots |x_{m-1}\rangle|y\rangle, \quad \text{where } y = \Big(\sum_{i=1}^m x_i\Big) \bmod q. \]

Lemma 8 add$_q^m$ ∈ B-QNC$_f^0$.

Proof. By Theorem 7, $y = (\sum_{i=1}^m x_i) \bmod q$ can be approximated in constant depth and polynomial size. The result is, however, stored into ancilla qubits. Hence we have to erase $x_m$, which we may achieve by first negating the contents of y by $|y\rangle \to |-y\rangle$, computing the sum $w = y + \sum_{i=1}^{m-1} x_i$ in a fresh ancilla, doing a bitwise controlled-not of w into $x_m$, uncomputing w, and finally re-negating y. We then swap the ancillas $|y\rangle$ with the erased qubits in $|x_m\rangle$. □

² The depths are really small, from 2 to 5.
4.4 Quantum Fourier transform

The QFT is a very powerful tool used in several quantum algorithms, e.g. factoring of integers and computing the discrete logarithm [Sho94].

Definition 6 The quantum Fourier transform with respect to modulus q performs the Fourier transform on the quantum amplitudes of the state, i.e. it maps

\[ F_q : |x\rangle \to |\psi_x\rangle = \frac{1}{\sqrt q} \sum_{y=0}^{q-1} \omega^{xy} |y\rangle, \quad \text{where } \omega = e^{2\pi i / q}, \tag{2} \]

for $x \in \{0, 1, \ldots, q-1\}$ and it behaves arbitrarily on the other states.



4.4.1 QFT with a power-of-2 modulus 

Let $q = 2^n$. Coppersmith has shown in [Cop94] how to compute the QFT in quadratic depth, quadratic size, and without ancillas. The depth has further been improved to linear [folklore]. Cleve and Watrous have shown in [CW00] that the QFT can be approximated with error $\varepsilon$ in depth $O(\log n + \log\log\frac 1\varepsilon)$ and size $O(n \log\frac n\varepsilon)$. They also show that if only gates acting on a constant number of qubits are allowed (in particular, the fan-out gate is not allowed), logarithmic depth is necessary. We show that the approximate circuit for the QFT from [CW00] can be compressed to constant depth, if we allow the fan-out gate.



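Theorem 9 QFT ∈ B-QNC$_f^0$.

Proof. The operator $F_q : |x\rangle \to |\psi_x\rangle$ can be computed by composing:

1. Fourier state construction (QFS): $|x\rangle|0\rangle \ldots |0\rangle \to |x\rangle|\psi_x\rangle|0\rangle \ldots |0\rangle$

2. Copying Fourier state (COPY): $|x\rangle|\psi_x\rangle|0\rangle \ldots |0\rangle \to |x\rangle|\psi_x\rangle \ldots |\psi_x\rangle$

3. Uncomputing phase estimation (QFP): $|\psi_x\rangle \ldots |\psi_x\rangle|x\rangle \to |\psi_x\rangle \ldots |\psi_x\rangle|0\rangle$

4. Uncomputing COPY: $|\psi_x\rangle \ldots |\psi_x\rangle|0\rangle \to |\psi_x\rangle|0\rangle \ldots |0\rangle$

The following lemmas show that each of these individual operators is in B-QNC$_f^0$. □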



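Lemma 10 QFS ∈ QNC$_f^0$.

Proof. QFS maps $|x\rangle|0\rangle \to |x\rangle|\psi_x\rangle$. Define $|\rho_r\rangle = \frac{|0\rangle + e^{2\pi i r}|1\rangle}{\sqrt 2}$. It is simple to prove that $|\psi_x\rangle = |\rho_{x/2^1}\rangle|\rho_{x/2^2}\rangle \ldots |\rho_{x/2^n}\rangle$:

\[ |\psi_x\rangle = \frac{1}{\sqrt{2^n}} \sum_{y=0}^{2^n-1} \omega^{xy} |y\rangle = \frac{1}{\sqrt{2^n}} \sum_{y=0}^{2^n-1} \bigotimes_{k=1}^n e^{2\pi i x y_{n-k} / 2^k} |y_{n-k}\rangle = \bigotimes_{k=1}^n \frac{1}{\sqrt 2} \sum_{b=0}^{1} e^{2\pi i b x / 2^k} |b\rangle = \bigotimes_{k=1}^n |\rho_{x/2^k}\rangle. \]

The n qubits $|\rho_{x/2^k}\rangle$ can be computed from x in parallel as follows: $|\rho_{x/2^k}\rangle = R_z\big(\frac{2\pi x}{2^k}\big) H |0\rangle$ is computed by the rotation by value (Lemma 3) in constant depth and linear size. □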
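The product decomposition of the Fourier state can be confirmed numerically for small n. The following Python sketch (helper names are assumptions introduced here) compares the amplitudes of $|\psi_x\rangle$ from Definition 6 with those of the tensor product, where the k-th factor $|\rho_{x/2^k}\rangle$ is indexed by the bit $y_{n-k}$ of y.

```python
import cmath
import math

def fourier_state(x, n):
    """Amplitudes of |psi_x> for modulus q = 2**n."""
    q = 2 ** n
    return [cmath.exp(2j * math.pi * x * y / q) / math.sqrt(q) for y in range(q)]

def rho(r):
    """Amplitudes of |rho_r> = (|0> + e^{2 pi i r}|1>) / sqrt(2)."""
    return [1 / math.sqrt(2), cmath.exp(2j * math.pi * r) / math.sqrt(2)]

def product_state(x, n):
    """Amplitudes of rho_{x/2^1} (x) ... (x) rho_{x/2^n}."""
    amps = []
    for y in range(2 ** n):
        amp = 1
        for k in range(1, n + 1):
            bit = (y >> (n - k)) & 1  # factor k is indexed by bit y_{n-k}
            amp *= rho(x / 2 ** k)[bit]
        amps.append(amp)
    return amps

def states_agree(n):
    """True if both descriptions coincide for every x in Z_{2^n}."""
    return all(abs(a - b) < 1e-9
               for x in range(2 ** n)
               for a, b in zip(fourier_state(x, n), product_state(x, n)))
```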
Figure 8: Measurement of $|\rho_{x/2^k}\rangle$ in a random basis

Lemma 11 COPY ∈ B-QNC$_f^0$.

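Proof. COPY maps $|\psi_x\rangle|0\rangle \ldots |0\rangle \to |\psi_x\rangle \ldots |\psi_x\rangle$. Take the reversible addition gate modulo $2^n$: $(\mathrm{add}_{2^n}^2)\,|y\rangle|x\rangle = |y\rangle|(x + y) \bmod 2^n\rangle$. It is simple to prove that $(\mathrm{add}_{2^n}^2)^{-1} |\psi_y\rangle|\psi_x\rangle = |\psi_{x+y}\rangle|\psi_x\rangle$:

\[ (\mathrm{add}_{2^n}^2)^{-1} \frac{1}{2^n} \sum_{l,k=0}^{2^n-1} \omega^{yl + xk} |l\rangle|k\rangle = \frac{1}{2^n} \sum_{l,k} \omega^{yl + xk} |l\rangle|k - l\rangle = \frac{1}{2^n} \sum_{l,m} \omega^{(x+y)l + xm} |l\rangle|m\rangle = |\psi_{x+y}\rangle|\psi_x\rangle. \]

Hence $(\mathrm{add}_{2^n}^2)^{-1} |\psi_0\rangle|\psi_x\rangle = |\psi_x\rangle|\psi_x\rangle$. The state $|\psi_0\rangle = H^{\otimes n}|0^n\rangle$ is easy to prepare in constant depth. Furthermore, $(\mathrm{add}_{2^n}^m)^{-1} |\psi_0\rangle \ldots |\psi_0\rangle|\psi_x\rangle = |\psi_x\rangle \ldots |\psi_x\rangle|\psi_x\rangle$, because the addition of $m-1$ numbers into one register is equivalent to $m-1$ consecutive additions of one number. Each such reversible addition copies $|\psi_x\rangle$ into one register. Note that the $\mathrm{add}_{2^n}^m$ gate performs all these additions in parallel. By Lemma 8, the reversible addition gate is in B-QNC$_f^0$. □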
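The copying identity can be verified directly on amplitude vectors. The Python sketch below is an illustrative check with hypothetical helper names; it represents a two-register state as a dictionary from basis pairs $(l, k)$ to amplitudes, applies the inverse addition $|l\rangle|k\rangle \to |l\rangle|(k - l) \bmod q\rangle$, and compares the result with $|\psi_{x+y}\rangle|\psi_x\rangle$. The modulus q is a parameter, so the same check also covers moduli that are not powers of 2.

```python
import cmath
import math

def psi(x, q):
    """Amplitudes of the Fourier state |psi_x> with respect to modulus q."""
    return [cmath.exp(2j * math.pi * x * y / q) / math.sqrt(q) for y in range(q)]

def tensor(a, b):
    """Two-register product state as {(l, k): amplitude}."""
    return {(l, k): a[l] * b[k] for l in range(len(a)) for k in range(len(b))}

def inverse_add(state, q):
    """Apply add^{-1}: |l>|k> -> |l>|(k - l) mod q>."""
    out = {}
    for (l, k), amp in state.items():
        key = (l, (k - l) % q)
        out[key] = out.get(key, 0) + amp
    return out

def copies_fourier_state(x, y, q):
    """Check add^{-1} |psi_y>|psi_x> == |psi_{x+y}>|psi_x>."""
    got = inverse_add(tensor(psi(y, q), psi(x, q)), q)
    want = tensor(psi((x + y) % q, q), psi(x, q))
    return all(abs(got[key] - want[key]) < 1e-9 for key in want)
```

Setting y = 0 gives the copy operation used in the proof: `copies_fourier_state(5, 0, 8)` confirms that $|\psi_0\rangle|\psi_5\rangle$ is mapped to $|\psi_5\rangle|\psi_5\rangle$.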

Lemma 12 QFP ∈ B-QNC$_f^0$.

Proof. QFP maps $|\psi_x\rangle \ldots |\psi_x\rangle|0\rangle \to |\psi_x\rangle \ldots |\psi_x\rangle|x\rangle$. By Cleve and Watrous [CW00, Section 3.3], we can compute x with probability at least $1 - \varepsilon$ from $O(\log\frac n\varepsilon)$ copies of $|\psi_x\rangle$ in depth $O(\log n + \log\log\frac 1\varepsilon)$ and size $O(n \log\frac n\varepsilon)$. Use $\varepsilon = \frac{1}{\mathrm{poly}(n)}$. It is simple to convert their circuit into constant depth, provided we have fan-out. The details are sketched below.

The input consists of $m = O(\log\frac n\varepsilon)$ copies of $|\psi_x\rangle = |\rho_{x/2^1}\rangle|\rho_{x/2^2}\rangle \ldots |\rho_{x/2^n}\rangle$. Measure each $|\rho_{x/2^k}\rangle$ $\frac m2$ times in the basis $\{|\rho_{0.01}\rangle, |\rho_{0.11}\rangle\}$ and $\frac m2$ times in the Hadamard basis $\{|\rho_{0.00}\rangle, |\rho_{0.10}\rangle\}$. The state $|\rho_{x/2^k}\rangle = \frac{1}{\sqrt 2}\big(|0\rangle + e^{2\pi i (0.x_{k-1} x_{k-2} \ldots x_0)}|1\rangle\big)$ lies on the middle circle of the Bloch sphere; it is shown in Figure 8. If $|\rho_{x/2^k}\rangle$ is in the white region, then the measurement in the first basis tells whether $x_{k-1} = 0$ or 1 with probability bounded away from $\frac 12$. If $|\rho_{x/2^k}\rangle$ is in the shaded region, then the measurement in the Hadamard basis tells whether $x_{k-1} = x_{k-2}$ or $\neg x_{k-2}$ (denoted by P, N) with probability bounded away from $\frac 12$.

For each k, perform the majority vote and obtain the correct answer $z_k \in \{0, 1, \mathrm{P}, \mathrm{N}\}$ with error probability at most $2^{-\Omega(m)} = \frac \varepsilon n$. The probability of having any error is at most n times bigger, i.e. at most $\varepsilon$. Compute $x_{n-1} \ldots x_1 x_0$ from $z_{n-1} \ldots z_1 z_0$ in constant depth. The bit $x_k$ is computed as follows:

1. If all of $z_k, z_{k-1}, \ldots, z_{l+1}$ are in {P, N} and $z_l \in \{0, 1\}$, compute the parity of the number of N's and add it to $z_l$ (assuming $z_{-1} = 0$), otherwise return 0.

2. Check and compute all prefixes $l$ in parallel and take the logical Or of the results.

All the gates used (fan-out, parity, And, Or, majority) are in B-QNC$_f^0$. □
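The final decoding step is purely classical and can be sketched in Python. The code below is an illustration under stated assumptions: `z[j]` is the voted symbol for bit $x_j$ (with `z[0]` belonging to the least significant bit), "P" means $x_j = x_{j-1}$, "N" means $x_j = \neg x_{j-1}$, and $x_{-1} = 0$ by convention. `decode_parallel` follows the two-step rule above, in which every pair (k, l) is evaluated independently; `decode_sequential` is a straightforward reference implementation, and both function names are hypothetical.

```python
def decode_sequential(z):
    """Reference decoder: resolve the symbols one after another."""
    x, prev = [], 0  # prev plays the role of x_{-1} = 0
    for symbol in z:
        if symbol == "P":
            bit = prev
        elif symbol == "N":
            bit = 1 - prev
        else:
            bit = symbol
        x.append(bit)
        prev = bit
    return x

def decode_parallel(z):
    """Decoder following the rule in the text; no result depends on another."""
    padded = [0] + list(z)  # padded[j + 1] = z_j and padded[0] = z_{-1} = 0
    x = []
    for k in range(len(z)):
        bit = 0
        for l in range(-1, k + 1):
            run = padded[l + 2:k + 2]  # the symbols z_{l+1}, ..., z_k
            if padded[l + 1] in (0, 1) and all(s in ("P", "N") for s in run):
                # parity of the number of N's, added to z_l
                bit |= padded[l + 1] ^ (run.count("N") % 2)
        x.append(bit)
    return x
```

For each k exactly one prefix l satisfies the condition (the nearest position at or below k holding a definite bit), which is why taking the Or over all prefixes is safe.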

4.4.2 QFT with an arbitrary modulus 

Let $q \ne 2^n$. Cleve and Watrous have shown in [CW00] that the QFT can be approximated with error $\varepsilon$ in depth $O((\log\log q)(\log\log\frac 1\varepsilon))$ and size $\mathrm{poly}(\log q + \log\frac 1\varepsilon)$. We show that their circuit can also be compressed into constant depth, if we use the fan-out gate. The relation between quantum Fourier transforms with different moduli was described in [HH99].

Remark. We actually implement a slightly more general operation, where q is not a fixed constant, but an n-bit input number. This generalised QFT maps $|q\rangle|x\rangle \to |q\rangle|\psi_x\rangle$. The register $|q\rangle$ is implicitly included in all operations. We will henceforth omit it and the generalised operations are denoted simply by QFT$_q$, QFS$_q$, COPY$_q^m$, and QFP$_q$.

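Theorem 13 QFT$_q$ ∈ B-QNC$_f^0$.

Proof. Let $|\mathrm{dummy}_{q,x}\rangle$ denote an unspecified quantum state depending on two parameters q, x. The operator $F_q : |x\rangle \to |\psi_x\rangle|\mathrm{dummy}_{q,0}\rangle$ can be computed by composing:

1. QFS$_q$: $|x\rangle \to |x\rangle|\psi_x\rangle|\mathrm{dummy}_{q,x}\rangle$

2. COPY$_q^{m+1}$: $\to |x\rangle|\psi_x\rangle|\mathrm{dummy}_{q,x}\rangle\,(|\psi_x\rangle|\mathrm{dummy}_{q,0}\rangle)^{\otimes m}$

3. Uncomputing QFS$_q$: $\to |x\rangle\,(|\psi_x\rangle|\mathrm{dummy}_{q,0}\rangle)^{\otimes m}$

4. Uncomputing QFP$_q$: $\to (|\psi_x\rangle|\mathrm{dummy}_{q,0}\rangle)^{\otimes m}$

5. Uncomputing COPY$_q^m$: $\to |\psi_x\rangle|\mathrm{dummy}_{q,0}\rangle$,

where empty registers are omitted for clarity. The state $|\mathrm{dummy}_{q,0}\rangle$ is not entangled with $|x\rangle$ and hence it can be traced out. We obtain the quantum Fourier transform $F_q$. The following lemmas show that each of these individual operators is in B-QNC$_f^0$. □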

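Lemma 14 QFS$_q$ ∈ B-QNC$_f^0$.

Proof. QFS$_q$ maps $|x\rangle|0\rangle \to |x\rangle|\psi_x\rangle|\mathrm{dummy}_{q,x}\rangle$ for some "garbage" state $|\mathrm{dummy}_{q,x}\rangle$. We will show that QFS$_q$ is well approximated by a QFS with a power-of-2 modulus of the magnitude $q^3$. Let $n = \lceil \log q \rceil$. Take $N = 3n$ and extend x by leading zeroes into N bits. Using Lemma 10, perform QFS$_{2^N}$ and obtain the state $|x\rangle \frac{1}{\sqrt{2^N}} \sum_{y=0}^{2^N-1} e^{2\pi i x y / 2^N} |y\rangle$.

Set $u = \lfloor 2^N / q \rfloor$ and apply integer division by u to the second register, i.e. map $|y\rangle \to |y_1\rangle|y_2\rangle$, where $y_1 = \lfloor y/u \rfloor \in \{0, 1, \ldots, q\}$ and $y_2 = y \bmod u$. This can be done reversibly in constant depth by a few applications of Theorem 7 using the method from Lemma 8. The quantum state can be written as

\[ \frac{1}{\sqrt{2^N}} \sum_{y=0}^{2^N-1} e^{2\pi i x y / 2^N} |y\rangle \to \frac{1}{\sqrt{2^N}} \sum_{y_1=0}^{q-1} \sum_{y_2=0}^{u-1} e^{2\pi i x (y_1 u + y_2) / 2^N} |y_1\rangle|y_2\rangle + |w\rangle, \]

where $|w\rangle = \frac{1}{\sqrt{2^N}} \sum_{z=0}^{v-1} e^{2\pi i x (qu + z) / 2^N} |q\rangle|z\rangle$ and $v = 2^N \bmod u = 2^N - qu = 2^N \bmod q < q$. The sum has been rearranged using $y = y_1 u + y_2$. Now, $\|\,|w\rangle\,\| = \sqrt{v / 2^N} = O(2^{-n})$ is exponentially small and so it can be neglected. Decompose the quantum state into the tensor product

\[ \frac{1}{\sqrt q} \sum_{y_1=0}^{q-1} e^{2\pi i x y_1 u / 2^N} |y_1\rangle \otimes \sqrt{\frac{q}{2^N}} \sum_{y_2=0}^{u-1} e^{2\pi i x y_2 / 2^N} |y_2\rangle. \]

Now, u is exponentially close to $\frac{2^N}{q}$, because $\frac{uq}{2^N} = \frac{2^N - v}{2^N} = 1 - O(2^{-2n})$. Since $\frac{x y_1}{q} = O(2^n)$, the replacement of $\frac{uq}{2^N}$ by 1 in the exponent causes only exponentially small error $O(2^{-n})$. Hence the quantum state is exponentially close to

\[ \frac{1}{\sqrt q} \sum_{y=0}^{q-1} e^{2\pi i x y / q} |y\rangle \otimes \frac{1}{\sqrt u} \sum_{z=0}^{u-1} e^{2\pi i x z / 2^N} |z\rangle = |\psi_x\rangle|\mathrm{dummy}_{q,x}\rangle. \]

The "garbage" state $|\mathrm{dummy}_{q,x}\rangle$ arises as a byproduct of the higher precision 3n-bit arithmetic. We clean it up later by uncomputing QFS$_q$ after copying $|\psi_x\rangle$; see the proof of Theorem 13. It actually gets replaced by $|\mathrm{dummy}_{q,0}\rangle = \frac{1}{\sqrt u} \sum_{z=0}^{u-1} |z\rangle$, which does not depend on x, and it thus causes no harm. We have approximated QFS$_q$ in constant depth. □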
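The quality of this approximation can be examined numerically for small moduli. The Python sketch below is an illustration and not part of the original proof; the function name is hypothetical and $\log$ is taken to base 2. It computes the overlap between the state obtained from QFS$_{2^N}$ followed by division by u and the ideal product state $|\psi_x\rangle|\mathrm{dummy}_{q,x}\rangle$, ignoring the small leftover component $|w\rangle$.

```python
import cmath
import math

def qfs_overlap(q, x):
    """|<ideal | actual>| for the construction in the proof of Lemma 14."""
    n = math.ceil(math.log2(q))
    big = 2 ** (3 * n)  # power-of-2 modulus 2^N with N = 3n
    u = big // q
    overlap = 0j
    for y in range(big):
        y1, y2 = divmod(y, u)  # integer division of the second register by u
        if y1 >= q:
            continue  # this part belongs to the negligible state |w>
        actual = cmath.exp(2j * math.pi * x * y / big) / math.sqrt(big)
        ideal = (cmath.exp(2j * math.pi * x * y1 / q) / math.sqrt(q)
                 * cmath.exp(2j * math.pi * x * y2 / big) / math.sqrt(u))
        overlap += ideal.conjugate() * actual
    return abs(overlap)
```

Already for q = 5 (so N = 9) the overlap is above 0.99 for every x; the deviation from 1 shrinks as n grows.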



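Lemma 15 COPY$_q^m$ ∈ B-QNC$_f^0$.

Proof. COPY$_q^m$ maps $|\psi_x\rangle|0\rangle \ldots |0\rangle \to |\psi_x\rangle\,(|\psi_x\rangle|\mathrm{dummy}_{q,0}\rangle)^{\otimes (m-1)}$. The proof is similar to the proof of Lemma 11. First, prepare $m-1$ states $|\psi_0\rangle|\mathrm{dummy}_{q,0}\rangle$ by applying QFS$_q$ to $|0\rangle|0\rangle$ (Lemma 14). Second, use the inverse of the reversible addition modulo q to map $(\mathrm{add}_q^m)^{-1} : |\psi_0\rangle \ldots |\psi_0\rangle|\psi_x\rangle \to |\psi_x\rangle \ldots |\psi_x\rangle|\psi_x\rangle$ (Lemma 8). □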



Lemma 16 QFP$_q \in$ B-QNC$_f^0$.

Proof. QFP$_q$ maps $|\psi_x\rangle \ldots |\psi_x\rangle|0\rangle \to |\psi_x\rangle \ldots |\psi_x\rangle|x\rangle$. We use an idea similar to the proof of
an earlier lemma. Let $n = \lceil \log q \rceil$ and $p = 3n$. Extend $|\psi_x\rangle$ by leading zeroes to $p$ bits and apply
$F_{2^p}^\dagger$ to them. We obtain many copies of the state

$$\frac{1}{\sqrt{q\,2^p}} \sum_{z=0}^{2^p-1} \sum_{y=0}^{q-1} e^{2\pi i \left(\frac{xy}{q} - \frac{yz}{2^p}\right)} |z\rangle.$$

The exponent can be rewritten to $2\pi i \left(\frac{x}{q} - \frac{z}{2^p}\right) y$. Intuitively, if $z$ is close to $2^p \frac{x}{q}$, then $\left|\frac{x}{q} - \frac{z}{2^p}\right|$ is small,
the absolute value of the angle in the exponent is small for every $y \in \{0, 1, \ldots, q - 1\}$,
and the amplitudes sum up constructively. If $z$ is not close to $2^p \frac{x}{q}$, then the amplitudes interfere
destructively. The quantum state has most of its amplitude on the good $z$'s. So we compute
reversibly, by division with remainder, an estimate $x'$ of $x$ from $z$. A detailed analysis shows
that $P[x' = x] \ge \frac{1}{2} + \delta$ for some constant $\delta$ [CW00, HH99]. Here we do not present the details,
because our goal is the compression of the circuit from [CW00] into constant depth.

We transform all $m = O(\log \frac{n}{\varepsilon})$ input quantum states $|\psi_x\rangle$ into $m$ independent estimates
$|x'\rangle$. We then estimate all bits of $x$ one by one from these $m$ estimates by majority gates. Each
bit of $x$ is wrong with probability at most $2^{-\log(n/\varepsilon)} = \frac{\varepsilon}{n}$. The probability of having an
error among the $n$ bits of $x$ is thus at most $\varepsilon$. Finally, save the estimation of $x$ in the target
register and uncompute the divisions and the quantum Fourier transforms. With probability at
least $1 - \varepsilon$, the mapping QFP$_q$ has been performed. Use a polynomially small $\varepsilon$. □



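4.5 Quantum phase estimation

The method of computing QFT$_{2^n}$ can also be used for phase estimation.

Theorem 17 Given a gate $S_x : |y\rangle|\phi\rangle \to |y\rangle R_z\!\left(\frac{2\pi x}{2^n} y\right)|\phi\rangle$ for basis states $|y\rangle$, where $x \in \mathbb{Z}_{2^n}$ is
unknown, we can determine $x$ with probability at least $1 - \varepsilon$ in constant depth, size $O(n \log \frac{n}{\varepsilon})$,
and using the $S_x$ gate $O(n \log \frac{n}{\varepsilon})$ times.

Proof. Obtain an estimate of $x$ by applying the QFP to $O(\log \frac{n}{\varepsilon})$ copies of the quantum state
$|\psi_x\rangle = |\rho_{x/2^1}\rangle|\rho_{x/2^2}\rangle \ldots |\rho_{x/2^n}\rangle$. Each $|\rho_{x/2^k}\rangle$ can be computed by one application of $S_x$ to
$|2^{n-k}\rangle \frac{|0\rangle + |1\rangle}{\sqrt{2}}$, because $|\rho_{x/2^k}\rangle = R_z\!\left(\frac{2\pi x}{2^k}\right) \frac{|0\rangle + |1\rangle}{\sqrt{2}} = R_z\!\left(\frac{2\pi x}{2^n} 2^{n-k}\right) \frac{|0\rangle + |1\rangle}{\sqrt{2}}$. □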



5 Exact circuits of small depth 

In the previous section, we have shown how to approximate the exact [t] gate in constant depth. 
In this section, we show how to compute it exactly in log-star depth. The circuits in this section 
use arbitrary one-qubit gates instead of a fixed basis, otherwise they would not be exact. 

Lemma 18 The function Or on $n$ qubits can be reduced exactly to Or on $m = \lceil \log(n + 1) \rceil$
qubits in constant depth and size $O(n \log n)$.

Proof. We use a technique similar to the proof of Theorem 3. Recall the quantum state $|\mu_\varphi^{|x|}\rangle$
defined earlier. For $k \in \{1, 2, \ldots, m\}$, compute in parallel $|y_k\rangle = |\mu_{\varphi_k}^{|x|}\rangle$ for
angle $\varphi_k = \frac{2\pi}{2^k}$. Let $|y\rangle = |y_1 y_2 \ldots y_m\rangle$.

• If $|x| = 0$, then $\langle y|0^m\rangle = 1$, because $|y_k\rangle = |0\rangle$ for each $k$.

• If $|x| \ne 0$, then $\langle y|0^m\rangle = 0$, because at least one qubit is one with certainty. Take
the unique decomposition of $|x|$ into a product of a power of 2 and an odd number:
$|x| = 2^a (2b + 1)$ for $a, b \in \mathbb{N}_0$. Then

$$\langle 1|y_{a+1}\rangle = \frac{1 - e^{i \varphi_{a+1} |x|}}{2} = \frac{1 - e^{i \frac{\pi}{2^a} 2^a (2b+1)}}{2} = \frac{1 - e^{i \pi (2b+1)}}{2} = \frac{1 - e^{i\pi}}{2} = 1.$$

It follows that $x$ is non-zero if and only if $y$ is. Hence the original problem is exactly reduced
to a problem of logarithmic size. □
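
The certainty argument can be checked numerically. The following is a minimal Python sketch, assuming that the recalled one-qubit state has the form $\frac{1 + e^{i\varphi|x|}}{2}|0\rangle + \frac{1 - e^{i\varphi|x|}}{2}|1\rangle$ (an assumption made here for illustration) and that $\varphi_k = 2\pi/2^k$. All function names are hypothetical.

```python
import cmath
import math


def one_amplitude(weight, phi):
    """Amplitude of |1> in the assumed state for Hamming weight `weight`."""
    return (1 - cmath.exp(1j * phi * weight)) / 2


def zero_overlap(weight, n):
    """Overlap <y|0^m> of the reduced register with the all-zero state."""
    m = n.bit_length()  # equals ceil(log2(n + 1)) for n >= 1
    overlap = 1 + 0j
    for k in range(1, m + 1):
        phi_k = 2 * math.pi / 2 ** k
        overlap *= (1 + cmath.exp(1j * phi_k * weight)) / 2
    return overlap


def certain_one_index(weight):
    """Return a + 1, where weight = 2^a (2b + 1) and weight > 0."""
    a = (weight & -weight).bit_length() - 1  # number of trailing zero bits
    return a + 1


def amplitude_of_certain_qubit(weight):
    """Amplitude of |1> on qubit y_{a+1}; it should have modulus 1."""
    k = certain_one_index(weight)
    return one_amplitude(weight, 2 * math.pi / 2 ** k)
```

For a non-zero weight the qubit with index $a + 1$ is one with certainty, so the overlap with $|0^m\rangle$ vanishes; for weight zero every factor equals 1.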



Theorem 19 exact[t] $\in$ QAC$_f^0$.

Proof. Using the methods from an earlier theorem and Lemma 18, exact[t] can also be reduced to
Or of logarithmic size. The reduction has constant depth and size $O(n \log n)$. Hence exact[t] is
QNC$_f^0$-reducible to Or, or simply exact[t] $\in$ QAC$_f^0$, because QAC$_f^0$ includes both QNC$_f^0$ and the
Or gate. □



Theorem 20 exact[t] $\in$ QNC$_f(\log^* n)$, i.e. exact[t] can be computed exactly in log-star depth
and size $O(n \log n)$.

Proof. Apply the reduction used in Lemma 18 in total $(\log^* n)$ times, until the input size is at
most 2. Compute and save the outcome, and clean ancillas by uncomputation. The circuit size
is $O(n \log n)$. □
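
To see how quickly the repeated reduction terminates, the sketch below iterates the size map $n \mapsto \lceil \log(n + 1) \rceil$ of Lemma 18 until at most 2 qubits remain. The function names are hypothetical, and the code only counts rounds; it does not simulate the circuit.

```python
def reduce_once(n):
    """Register size after one exact Or-reduction: ceil(log2(n + 1))."""
    return n.bit_length()  # for n >= 1 this equals ceil(log2(n + 1))


def reduction_rounds(n):
    """Number of reductions applied until the input size is at most 2."""
    rounds = 0
    while n > 2:
        n = reduce_once(n)
        rounds += 1
    return rounds
```

For example, an input of a million qubits shrinks as 1000000, 20, 5, 3, 2, that is in four rounds.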
6 Circuits of small size

In this section, we decrease the size of some circuits. We allow the use of arbitrary one-qubit
gates instead of a fixed basis.

6.1 Constant depth approximation of Or 

In this section, we apply the reduction from Lemma 18 repeatedly to shrink the circuit for Or.
We first reduce the size of the circuit to $O(n \log n)$. We then develop a recursive method that
reduces the size even further. Let us define a useful notation.

Definition 7 Let $x = x_1 x_2 \ldots x_n$. By Or-reduction $n \to m$ with error $\varepsilon$ we mean a quantum
circuit mapping $|x\rangle|0^m\rangle \to |x\rangle|\psi\rangle$ such that, if $|x| = 0$, then $|\psi\rangle = |0^m\rangle$ and, if $|x| \ne 0$, then
$|\langle 0^m|\psi\rangle|^2 \le \varepsilon$.

The Or-reduction preserves the logical Or of qubits, i.e. $|x| = 0$ iff $|\psi| = 0$ with high
probability. Theorem 3 provides an Or-reduction $n \to 1$ with error $\frac{1}{n}$, constant depth, and size
$n^2 \log n$. Lemma 18 provides an Or-reduction $n \to \log n$ with error 0, constant depth, and size
$n \log n$.

Lemma 21 There is an Or-reduction $n \to 1$ with error $\frac{1}{n}$, constant depth, and size $n \log n$.

Proof. Divide the input into blocks of size $\sqrt{n} \log n$. First, reduce each block by Lemma 18
to $\frac{1}{2} \log n + \log\log n = O(\log n)$ qubits in constant depth and size $\sqrt{n} \log^2 n$. In total, we obtain
$\sqrt{n}$ new qubits in size $n \log n$. Second, compute the logical Or by Theorem 3 in constant depth,
size $(\sqrt{n})^2 \log \sqrt{n} = O(n \log n)$, and error $\frac{1}{\sqrt{n}}$. To reduce the error to $\frac{1}{n}$, repeat the computation
twice and return 1 if any of them returns 1 (the error is one-sided). The circuit size is doubled.
□

The circuit size can be reduced to $O(n \log^{(d)} n)$ for any constant number $d$ of iterations of
the logarithm. The trick is to divide input qubits into small blocks and perform the reduction
step on each of them. The number of variables is reduced by a small factor and we can thus
afford to apply a circuit of a slightly bigger size. If we repeat this reduction step $d$ times, we
obtain the desired circuit.

Theorem 22 There exist constants $c_1, c_2$ such that for every $d \in \mathbb{N}$, there is an Or-reduction
$n \to 1$ with error $\frac{1}{n}$, depth $c_1 d$, and size $c_2 d\, n \log^{(d)} n$.

Proof. By induction on $d$: we have already verified the case $d = 1$ in Lemma 21. For the
induction step: Divide $n$ input qubits into $n / \log^{(d-1)} n$ blocks of $\log^{(d-1)} n$ qubits. Using
Lemma 18, reduce each block to $\log^{(d)} n$ qubits in constant depth and size
$c_2 \log^{(d-1)} n \cdot \log^{(d)} n$.
Total size is $c_2\, n \log^{(d)} n$. We obtain $\frac{n}{\log^{(d-1)} n} \log^{(d)} n = o(n)$ new qubits. Using the induction
hypothesis, compute their logical Or in depth $c_1 (d - 1)$ and size

$$c_2 (d - 1) \left( \frac{n}{\log^{(d-1)} n} \log^{(d)} n \right) \cdot \log^{(d-1)} o(n) \le c_2 (d - 1)\, n \log^{(d)} n.$$

Together, it takes depth $c_1 d$ and size $c_2 d\, n \log^{(d)} n$.

The only approximate step is the final application of Lemma 21 for $d = 1$. It is applied on
at least $n / \log n$ variables, hence the error is $O(\log n / n)$. It can be reduced to $\frac{1}{n}$ by running the
computation twice. □
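
The size recurrence of this proof can be explored numerically. The sketch below is an idealised accounting, not the construction itself: it assumes real-valued block sizes (no rounding), sets the constant $c_2$ to 1, and charges $n \log n$ for the base case of Lemma 21. The function names are hypothetical.

```python
import math


def iterated_log(n, d):
    """Apply the base-2 logarithm d times to n."""
    value = float(n)
    for _ in range(d):
        value = math.log2(value)
    return value


def or_reduction_size(n, d):
    """Idealised size of the d-level Or-reduction on n qubits (c_2 = 1)."""
    if d == 1:
        return n * math.log2(n)  # base case: size n log n
    block = iterated_log(n, d - 1)   # qubits per block
    outputs = iterated_log(n, d)     # qubits left per block after the reduction
    layer_size = (n / block) * (block * outputs)  # equals n * log^(d) n
    remaining = (n / block) * outputs             # new qubits, o(n)
    return layer_size + or_reduction_size(remaining, d - 1)
```

Under these assumptions the total stays below $d\, n \log^{(d)} n$ for the sample value $n = 2^{64}$ and decreases as $d$ grows.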
6.2 Log-star depth computation of Or

Our best constant-depth circuit for Or is described by Theorem 22. It is approximate and it has
slightly super-linear size. In this section, we show that we can achieve an exact circuit of linear
size if we relax the restriction of constant depth. We consider $d$ in Theorem 22 a slowly growing
function of $n$ instead of a constant. Now we can use an Or-reduction better than Lemma 21:
Theorem 20 provides an Or-reduction $n \to 1$ with error 0, log-star depth, and size $n \log n$.

Lemma 23 There exist constants $c_1, c_2$ such that for every $d \in \mathbb{N}$, there is an Or-reduction
$n \to 1$ with error 0, depth $c_1 d + \log^* n$, and size $c_2 d\, n \log^{(d)} n$.

Proof. The same as of Theorem 22, but use the Or-reduction from Theorem 20 instead of
Lemma 21 in the last layer (for $d = 1$). The size stays roughly the same, the circuit becomes
exact, and the depth is increased by an additional term of $\log^* n$. □

Theorem 24 There is an Or-reduction $n \to 1$ with error 0, log-star depth, and linear size.

Proof. Divide the input into $\frac{n}{\log^* n}$ blocks of size $\log^* n$. Compute the logical Or of each
block by a balanced binary tree of depth $\log(\log^* n) \le \log^* n$ and in linear size. Using
Lemma 23 with $d = \log^* n$, compute the logical Or of $\frac{n}{\log^* n}$ new qubits in log-star depth
and size $O\!\left( \log^* n \cdot \frac{n}{\log^* n} \cdot \log^{(\log^* n)} n \right) = O(n)$. □
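
The final size bound rests on the fact that applying the logarithm $\log^* n$ times to $n$ leaves a value of at most 1. A minimal sketch of this accounting follows, assuming base-2 logarithms, $n \ge 2$, and the convention that $\log^* n$ counts applications until the value is at most 1; the function names are hypothetical.

```python
import math


def log_star(n):
    """Number of times log2 is applied to n before the value is at most 1."""
    count = 0
    value = float(n)
    while value > 1:
        value = math.log2(value)
        count += 1
    return count


def iterated_log(n, d):
    """Apply the base-2 logarithm d times to n."""
    value = float(n)
    for _ in range(d):
        value = math.log2(value)
    return value


def top_level_size(n):
    """Evaluate log* n * (n / log* n) * log^(log* n) n without constants."""
    d = log_star(n)
    return d * (n / d) * iterated_log(n, d)
```

Since the last factor is at most 1, the product is at most $n$, which is the linear bound used in the proof.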



6.3 Approximation of counting and threshold[t] 

In this section, we use the QFT for the parallelisation of increments. This allows us to
approximate the Hamming weight of the input in smaller size $O(n \log n)$.

Definition 8 The increment gate maps Incr$_n : |x\rangle \to |(x + 1) \bmod 2^n\rangle$.

Lemma 25 The increment gate is diagonal in the Fourier basis and its diagonal form is in
QNC$^0$.

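Proof. Let $\omega = e^{2\pi i / 2^n}$ and let $|x\rangle$ be any computational basis state. It is simple to prove the
following two equations:

1. Incr$_n = F_{2^n}^\dagger D_n F_{2^n}$ for diagonal $D_n = \sum_{y=0}^{2^n - 1} \omega^y |y\rangle\langle y|$.

2. $D = R_z(\pi) \otimes R_z(\pi/2) \otimes \ldots \otimes R_z(\pi/2^{n-1})$, because

$$D|x\rangle = \omega^x |x\rangle = \bigotimes_{k=1}^{n} \omega^{2^{n-k} x_{n-k}} |x_{n-k}\rangle = \bigotimes_{k=1}^{n} \left(e^{2\pi i / 2^k}\right)^{x_{n-k}} |x_{n-k}\rangle = \bigotimes_{k=1}^{n} R_z\!\left(\frac{2\pi}{2^k}\right) |x_{n-k}\rangle = \left( R_z(\pi) \otimes \ldots \otimes R_z(\pi/2^{n-1}) \right) |x\rangle.$$

We conclude that Incr $= F^\dagger D F$, and that $D$ is a tensor product of one-qubit operators. □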
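
Both equations can be checked by brute force for a small register. The sketch below is written under two assumptions that are conventions rather than statements from the text: $F|x\rangle = 2^{-n/2} \sum_y \omega^{xy} |y\rangle$ and $R_z(\theta) = \mathrm{diag}(1, e^{i\theta})$, with the first tensor factor acting on the most significant bit. The function names are hypothetical.

```python
import cmath
import math


def fourier_matrix(n_qubits):
    """Dense matrix of the QFT modulo 2^n under the assumed convention."""
    size = 2 ** n_qubits
    scale = 1 / math.sqrt(size)
    return [[scale * cmath.exp(2j * math.pi * ((r * c) % size) / size)
             for c in range(size)] for r in range(size)]


def rz_tensor_diagonal(n_qubits):
    """Diagonal of R_z(pi) x R_z(pi/2) x ... x R_z(pi/2^(n-1))."""
    diagonal = []
    for y in range(2 ** n_qubits):
        phase = 1 + 0j
        for k in range(1, n_qubits + 1):
            if (y >> (n_qubits - k)) & 1:  # bit y_{n-k} is set
                phase *= cmath.exp(1j * math.pi / 2 ** (k - 1))
        diagonal.append(phase)
    return diagonal


def conjugated_diagonal(n_qubits):
    """Return the dense matrix F^dagger D F."""
    f = fourier_matrix(n_qubits)
    d = rz_tensor_diagonal(n_qubits)
    size = 2 ** n_qubits
    return [[sum(f[y][a].conjugate() * d[y] * f[y][b] for y in range(size))
             for b in range(size)] for a in range(size)]
```

Comparing `conjugated_diagonal(3)` with the permutation matrix of $x \mapsto (x + 1) \bmod 8$ confirms the identity for three qubits up to rounding error.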
Remark. The addition of a fixed integer $b$ is as hard as the increment. By Lemma 25, Incr$^b =
F^\dagger D^b F$ and $(R_z(\varphi))^b = R_z(\varphi b)$, hence the diagonal version of the addition of $b$ is also in QNC$^0$.

Theorem 26 Counting can be approximated in constant depth and size O(nlogn). 

Proof. Compute the Hamming weight of the input. Each input qubit controls one increment
on an $m$-qubit counter initialised to 0, where $m = \lceil \log(n + 1) \rceil$. The increments Incr$_m$ are
parallelised (by an earlier theorem and Lemma 25), so we apply the quantum Fourier transform $F_{2^m}$
twice and the $n$ constant-depth controlled gates in parallel. The size is
$O(\mathrm{poly}(m) + nm) = O(n \log n)$. □

Remark. threshold[t] is equal to the most significant qubit of the counter if we align it to a
power of 2 by adding a fixed integer $2^m - t$. exact[t] can be computed by comparing the counter
with $t$.
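
The alignment trick of the remark is purely arithmetic and can be illustrated classically. The sketch below assumes the shifted counter is kept in a register of $m + 1$ bits, so that the added constant $2^m - t$ cannot overflow; this register width and the function names are assumptions made for illustration.

```python
def threshold_from_counter(weight, n, t):
    """Return 1 iff weight >= t, read off the top bit of the shifted counter."""
    m = n.bit_length()               # m = ceil(log2(n + 1)), so weight <= n < 2**m
    shifted = weight + (1 << m) - t  # add the fixed integer 2**m - t
    return (shifted >> m) & 1        # bit m of the assumed (m + 1)-bit register


def exact_from_counter(weight, t):
    """Return 1 iff the counter value equals t."""
    return int(weight == t)
```

The top bit is set exactly when the shifted value reaches $2^m$, which happens if and only if the Hamming weight is at least $t$.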



7 Concluding remarks 

7.1 Comparison with randomised circuits 

Let us compare our results for quantum circuits with similar results for classical randomised
circuits. We consider randomised circuits with bounded fan-in of Or and And gates, and
unbounded fan-out and parity (similar to the quantum model). Classical lower bounds are
folklore and we attach the proofs for the convenience of the reader in Appendix A.



Gate                                     Randomised               Quantum

Or and threshold[t] exactly              $\Theta(\log n)$          $O(\log^* n)$
mod[q] exactly                           $\Theta(\log n)$          $\Theta(1)$
Or with error $\frac{1}{n}$              $\Theta(\log\log n)$      $\Theta(1)$
threshold[t] with error $\frac{1}{n}$    $\Omega(\log\log n)$      $\Theta(1)$



7.2 Relations of quantum circuit classes 

We have shown that B-QNC$_f^0$ = B-QAC$_f^0$ = B-QACC$^0$ = B-QTC$_f^0$. If we allow
arbitrary one-qubit gates, then also QTC$_f^0$ = QAC$_f^0$ $\subseteq$ QNC$_f(\log^* n)$ (Theorems 19 and 20).
Several open problems of [GHMP02] have thus been solved. Only little is known about classes
that do not include the fan-out gate. For example, we do not know whether TC$^0$ $\subseteq$ QTC$^0$, we
only know that TC$^0$ $\subseteq$ QTC$_f^0$. It is simple to prove that parity is in TC$^0$. Take the logical Or
of exact[1], exact[3], exact[5], ..., and compute exact[k] from threshold[k] and threshold[k + 1].
However, this method needs fan-out to copy the input bits and hence it is not in QTC$^0$.



Fang et al. proved [FFG+03] a lower bound for fan-out. In particular, they showed that
logarithmic depth is needed to approximate parity using only a constant number of ancillas.
Unfortunately, their method breaks down with more than a linear number of ancillas and it
cannot be extended to other unbounded fan-in gates such as majority or threshold[t].



7.3 Upper bounds for B-QNC$_f^0$

Shor's original factoring algorithm [Sho94] uses modular exponentiation and the quantum
Fourier transform modulo $2^n$ followed by a polynomial-time deterministic algorithm. The
modular exponentiation can be replaced by multiplication of some subset of numbers $a, a^2, a^4,
\ldots, a^{2^{n-1}}$ [CW00]. The $n$ numbers can be quickly precomputed classically.
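
The replacement of modular exponentiation by a subset product can be illustrated classically. The sketch below precomputes the repeated squares of $a$ and multiplies those selected by the binary digits of the exponent; the function names and the parameter `n_bits` are hypothetical.

```python
def precompute_powers(a, modulus, n_bits):
    """Return [a^(2^0), a^(2^1), ..., a^(2^(n_bits - 1))] modulo `modulus`."""
    powers = [a % modulus]
    for _ in range(n_bits - 1):
        powers.append(powers[-1] * powers[-1] % modulus)
    return powers


def modexp_by_subset_product(a, exponent, modulus, n_bits):
    """Compute a^exponent mod modulus as a product of precomputed powers."""
    powers = precompute_powers(a, modulus, n_bits)
    result = 1 % modulus
    for j in range(n_bits):
        if (exponent >> j) & 1:  # bit j of the exponent selects a^(2^j)
            result = result * powers[j] % modulus
    return result
```

In the quantum setting the exponent is in superposition, so the selection of the subset is controlled by the exponent qubits while the powers themselves are classical constants.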
Since both multiplication of $n$ numbers (Theorem 7) and the QFT are in
B-QNC$_f^0$, there is a polynomial-time bounded-error classical algorithm with oracle B-QNC$_f^0$
factoring numbers, i.e. factoring $\in$ RP[B-QNC$_f^0$]. If B-QNC$_f^0$ $\subseteq$ BPP, then factoring $\in$
RP[BPP] $\subseteq$ BPP[BPP] = BPP. Discrete logarithms can be computed in a similar way using
modular exponentiation and the quantum Fourier transform modulo general $q$ [Sho94]. Since
QFT$_q$ $\in$ B-QNC$_f^0$, we conclude that also discrete-log $\in$ RP[B-QNC$_f^0$].

7.4 Open problems 

We propose the following open problems on computational aspects of multi-qubit gates: 

i. Is there a constant-depth exact circuit for Or? 

ii. Is there a constant-depth linear-size circuit for Or? 

iii. Are there exact circuits with a fixed basis? 

iv. Can we simulate unbounded fan-out in constant depth using unbounded fan-in gates, e.g. 
threshold [t] or exact [t]? 
A Lower bounds on classical circuits

Using the polynomial method [Bei93], we prove several lower bounds on the depths of
deterministic circuits. We consider circuits with fan-in of Or and And gates at most 2, and unbounded
fan-out and parity, the same as in the quantum model.

Basically, the value of each bit computed by a circuit can be computed by a multi-linear 
polynomial (over the field Z2) in the input bits. We are interested in the degree of such a 
polynomial; by proving a lower bound on the degree, we also lower-bound the depth of the 
circuit. It is simple to prove that the polynomial computing a Boolean function is unique. 

Each input bit $x_k \in \{0, 1\}$ is computed by a polynomial of degree 1. The Not gate
computes the polynomial $1 - p(x)$, where $p(x)$ is the polynomial computing its argument, and
the degree is unchanged. The And gate computes the polynomial $p_1(x) \cdot p_2(x)$ and the two
degrees are summed. The parity gate computes the polynomial $(p_1(x) + \ldots + p_k(x)) \bmod 2$ of
degree equal to the maximum degree among the arguments.

Lemma 27 The output of a circuit of depth $d$ has degree at most $2^d$.

Proof. By induction: by adding a new layer, we can at most double the degree when using the
And gate. □

And of $n$ bits is computed by a (unique) polynomial $x_1 x_2 \cdots x_n$ of degree $n$. Hence every
circuit computing And has depth at least $\log n$. It is simple to prove by contradiction that
also Or, threshold[t], and exact[t] have full degree $n$. Smolensky has proved a much stronger
result [Smo87], which implies that also the degree of mod[q] for $q > 2$ is $n$.
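
The degree of the unique multilinear polynomial over $\mathbb{Z}_2$ can be computed for small $n$ from the truth table. The sketch below uses the standard Möbius transform for this purpose; it is an illustration for And, Or, and parity only, and the function names are hypothetical.

```python
def anf_degree(f, n):
    """Degree of the multilinear Z_2 polynomial computing f on n bits."""
    size = 2 ** n
    # Truth table indexed by the integer whose bit i is the input x_i.
    coeffs = [f(tuple((mask >> i) & 1 for i in range(n))) & 1
              for mask in range(size)]
    # In-place Moebius transform: truth table -> monomial coefficients.
    for i in range(n):
        for mask in range(size):
            if mask & (1 << i):
                coeffs[mask] ^= coeffs[mask ^ (1 << i)]
    return max((bin(mask).count("1") for mask in range(size) if coeffs[mask]),
               default=0)


def and_gate(bits):
    return int(all(bits))


def or_gate(bits):
    return int(any(bits))


def parity_gate(bits):
    return sum(bits) % 2
```

With fan-in 2 and Lemma 27, a function of degree $n$ needs depth at least $\log n$, whereas parity has degree 1 and is therefore not constrained by this argument.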

Randomised circuits have access to random bits and may produce the result with a small 
error. Some functions are computed in smaller depth in this model. 

Lemma 28 Or can be computed with one-sided error $\frac{1}{2}$ by a randomised circuit of depth 2. The
error can be decreased to $\frac{1}{n}$ in additional depth $\log\log n$.

Proof. Take $n$ random bits $r_1, \ldots, r_n$ and output the parity $x_1 r_1 \oplus x_2 r_2 \oplus \cdots \oplus x_n r_n$. If $|x| = 0$, then the
circuit always outputs 0. If $|x| > 0$, then the probability that the parity is odd is equal to $\frac{1}{2}$.
If we perform the computation $(\log n)$ times using independent random bits, we decrease the
probability of error to $(\frac{1}{2})^{\log n} = \frac{1}{n}$. This can be done in additional depth $\log\log n$ by a balanced
binary tree of Or gates. □
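
The claim that the random parity is odd with probability exactly one half can be verified by enumerating all choices of the random bits for a small input. The sketch below does this exhaustively; the function names are hypothetical.

```python
from itertools import product


def random_parity(x, r):
    """Parity of the bitwise products x_1 r_1, ..., x_n r_n."""
    return sum(a & b for a, b in zip(x, r)) % 2


def acceptance_probability(x):
    """Probability over uniform random bits r that the parity is odd."""
    n = len(x)
    hits = sum(random_parity(x, r) for r in product((0, 1), repeat=n))
    return hits / 2 ** n


def error_after_repetitions(k):
    """One-sided error after taking the Or of k independent runs."""
    return 0.5 ** k
```

With $k = \log n$ repetitions the error is $2^{-\log n} = \frac{1}{n}$; for example, ten repetitions give error $\frac{1}{1024}$.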

By Yao's principle [Yao77], if we have a randomised circuit with error less than $2^{-n}$, then
there exists an assignment of random bits such that the result is always correct. That is, there
exists a deterministic circuit of the same shape. Hence also randomised circuits computing the
logical Or with exponentially small error have depth at least $\log n$.

Lemma 29 Every circuit computing Or with error $\frac{1}{n}$ has depth at least $\log\log n$.

Proof. Assume the converse: there exists a circuit of depth $d < \log\log n$ with error $\frac{1}{n}$. By
computing the logical Or independently $\frac{n}{\log n}$ times, we can reduce the error to $(\frac{1}{n})^{n / \log n} = 2^{-n}$.
This can be done in additional depth $\log \frac{n}{\log n} = \log n - \log\log n$. The total depth of this circuit
is $\log n - \log\log n + d < \log n$. However, by Yao's principle, the depth has to be at least $\log n$.
□